Deep Q-Learning with Gradient Target Tracking

Lee, Donghwan, Park, Bum Geun, Lee, Taeho

arXiv.org Artificial Intelligence

This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned, continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network's weights, held fixed for a number of iterations before being periodically replaced via a hard update. While this stabilizes training by providing consistent targets, it introduces a new challenge: the hard update period must be carefully tuned to achieve optimal performance. To address this issue, we propose two gradient-based target update methods: DQN with asymmetric gradient target tracking (AGT2-DQN) and DQN with symmetric gradient target tracking (SGT2-DQN). These methods replace the conventional hard target updates with continuous, structured updates driven by gradient descent, which effectively eliminates the need to tune the update period manually. We provide a theoretical analysis proving the convergence of these methods in tabular settings. Additionally, empirical evaluations demonstrate their advantages over standard DQN baselines, suggesting that gradient-based target updates can serve as an effective alternative to conventional target update mechanisms in Q-learning.
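As a rough illustration of the idea, the sketch below contrasts the conventional hard update with a gradient-based target update. It assumes the target parameters are pulled toward the online parameters by gradient descent on the squared distance between them; this loss is an assumption for illustration, and the exact AGT2-DQN/SGT2-DQN objectives are given in the paper.

```python
import numpy as np

def hard_update(theta_online, theta_target, step, period=1000):
    """Conventional DQN target update: copy the online weights every
    `period` steps; `period` must be tuned by hand."""
    if step % period == 0:
        theta_target[:] = theta_online
    return theta_target

def gradient_target_update(theta_online, theta_target, beta=1e-3):
    """Asymmetric gradient target tracking (illustrative): one gradient
    step on 0.5 * ||theta_target - theta_online||^2, so only the target
    moves toward the online weights. A symmetric variant would also pull
    the online weights toward the target."""
    grad = theta_target - theta_online   # gradient w.r.t. theta_target
    return theta_target - beta * grad
```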


Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach

Jeong, Narim, Lee, Donghwan

arXiv.org Artificial Intelligence

Soft Q-learning is a variation of Q-learning designed to solve entropy-regularized Markov decision problems, in which an agent aims to maximize the entropy-regularized value function. Despite its empirical success, there have been limited theoretical studies of soft Q-learning to date. This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms. We focus on two types of soft Q-learning algorithms: one utilizing the log-sum-exp operator and the other employing the Boltzmann operator. By using dynamical switching system models, we derive novel finite-time error bounds for both soft Q-learning algorithms. We hope that our analysis will deepen the current understanding of soft Q-learning by establishing connections with switching system models, and that it may even pave the way for new frameworks in the finite-time analysis of other reinforcement learning algorithms.
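The two operators are standard; a minimal, numerically stable sketch of each follows, with the temperature parameter `tau` assumed to play the role of the entropy-regularization coefficient.

```python
import numpy as np

def log_sum_exp_operator(q_values, tau):
    """Soft maximum: tau * log(sum_a exp(Q(s, a) / tau)).
    Shifting by the max keeps the exponentials from overflowing."""
    m = np.max(q_values)
    return m + tau * np.log(np.sum(np.exp((q_values - m) / tau)))

def boltzmann_operator(q_values, tau):
    """Boltzmann-weighted average: sum_a softmax(Q(s, .) / tau)_a * Q(s, a)."""
    m = np.max(q_values)
    weights = np.exp((q_values - m) / tau)
    weights /= weights.sum()
    return float(weights @ q_values)
```

Both operators approach the hard max as tau goes to 0, recovering the standard Q-learning target.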


Finite-Time Analysis of Simultaneous Double Q-learning

Na, Hyunjun, Lee, Donghwan

arXiv.org Artificial Intelligence

Q-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, its update rule is prone to overestimation bias. To address this issue, double Q-learning employs two independent Q-estimators, one of which is randomly selected and updated at each step of the learning process. This paper proposes a modified double Q-learning, called simultaneous double Q-learning (SDQ), together with its finite-time analysis. SDQ eliminates the need for random selection between the two Q-estimators, and this modification allows us to analyze double Q-learning through the lens of a novel switching system framework that facilitates efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double Q-learning while retaining its ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
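A tabular sketch of the simultaneous update is below. It assumes both estimators are updated at every step with the usual double-Q cross-evaluation targets; the precise SDQ update rule is specified in the paper.

```python
import numpy as np

def sdq_step(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Illustrative simultaneous double Q-learning update (tabular).
    Unlike double Q-learning, no coin flip chooses which estimator to
    update: both are updated at once. Each target selects the greedy
    action with one estimator but evaluates it with the other, which is
    what curbs the maximization bias."""
    a_star_A = np.argmax(QA[s_next])   # greedy action under QA
    a_star_B = np.argmax(QB[s_next])   # greedy action under QB
    QA[s, a] += alpha * (r + gamma * QB[s_next, a_star_A] - QA[s, a])
    QB[s, a] += alpha * (r + gamma * QA[s_next, a_star_B] - QB[s, a])
    return QA, QB
```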


Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

Lee, Donghwan

arXiv.org Artificial Intelligence

The objective of this paper is to investigate the finite-time behavior of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that these additional insights have the potential to uncover novel connections and foster collaboration between the control theory and reinforcement learning communities.
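For intuition, a minimal sketch of the minimax Q-learning update is below. To stay short it takes the max-min over pure strategies at the next state; the actual algorithm computes the value of the matrix game over mixed strategies (e.g., via a linear program).

```python
import numpy as np

def minimax_q_step(Q, s, a, b, r, s_next, alpha=0.1, gamma=0.99):
    """Illustrative minimax Q-learning update for a zero-sum game.
    Q has shape (num_states, num_actions_p1, num_actions_p2); player 1
    maximizes and player 2 minimizes. The pure-strategy max-min below is
    a simplification of the mixed-strategy game value."""
    game_value = np.max(np.min(Q[s_next], axis=1))  # maximin over pure strategies
    Q[s, a, b] += alpha * (r + gamma * game_value - Q[s, a, b])
    return Q
```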


Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

Lim, Han-Dong, Lee, Donghwan

arXiv.org Artificial Intelligence

Q-learning has long been one of the most popular reinforcement learning algorithms, and theoretical analysis of Q-learning has been an active research topic for decades. Although research on the asymptotic convergence of Q-learning has a long tradition, its non-asymptotic convergence has only recently come under active study. The main goal of this paper is to investigate a new finite-time analysis of asynchronous Q-learning under Markovian observation models from a control-system viewpoint. In particular, we introduce a discrete-time, time-varying switching system model of Q-learning with diminishing step-sizes for our analysis, which significantly improves on recent switching system analyses with constant step-sizes and leads to an \(\mathcal{O}\left( \sqrt{\frac{\log k}{k}} \right)\) convergence rate that is comparable to or better than most state-of-the-art results in the literature. Meanwhile, a technique based on a similarity transformation is newly applied to avoid the difficulty posed by diminishing step-sizes in the analysis. The proposed analysis brings additional insights, covers different scenarios, and provides new simplified templates for analysis to deepen our understanding of Q-learning via its unique connection to discrete-time switching systems.
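A minimal tabular sketch of the asynchronous update with a diminishing step-size follows; the environment interface and the specific schedule \(\alpha_k = 1/k\) are assumptions for illustration (the paper analyzes a particular diminishing schedule).

```python
import numpy as np

def async_q_learning(env, num_states, num_actions, gamma=0.99, steps=100_000):
    """Asynchronous Q-learning sketch: only the visited (s, a) entry is
    updated at each step, and the step-size shrinks over time. `env` is a
    hypothetical environment with reset() -> s and step(a) -> (s', r, done)."""
    Q = np.zeros((num_states, num_actions))
    s = env.reset()
    for k in range(1, steps + 1):
        a = np.random.randint(num_actions)   # exploratory behavior policy
        s_next, r, done = env.step(a)
        alpha_k = 1.0 / k                    # diminishing step-size (assumed form)
        Q[s, a] += alpha_k * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = env.reset() if done else s_next
    return Q
```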


Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time Switching System Models

Lee, Donghwan

arXiv.org Artificial Intelligence

This paper develops a novel framework to analyze the convergence of the Q-learning algorithm from a discrete-time switching system perspective. We prove that asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic switched linear system. This formulation offers novel and intuitive insights into Q-learning based mainly on control-theoretic frameworks. For instance, the proposed analysis explains the overestimation phenomenon in Q-learning due to the maximization bias. Based on control-theoretic arguments and some nice structural properties of Q-learning, a new finite-time analysis of Q-learning is given with a novel error bound.
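A hedged sketch of the switched-system view, with notation assumed rather than taken verbatim from the paper: write \(Q_k\) as a vector over state-action pairs, let \(D\) be a diagonal matrix of state-action visit probabilities, \(P\) the transition matrix, \(R\) the reward vector, and \(\Pi_Q\) the matrix selecting the greedy action of \(Q\). The noise-free form of the update then reads as follows.

```latex
% Illustrative switched-system formulation (notation assumed, not the
% paper's verbatim): the mode of the linear system is selected by the
% greedy policy of the current iterate Q_k.
\[
  Q_{k+1} = Q_k + \alpha D \left( R + \gamma P \Pi_{Q_k} Q_k - Q_k \right)
          = \underbrace{\left( I - \alpha D \left( I - \gamma P \Pi_{Q_k} \right) \right)}_{A_{\Pi_{Q_k}}} Q_k + \alpha D R .
\]
```

In this view Q-learning behaves as a discrete-time switched affine system whose mode matrix \(A_{\Pi_{Q_k}}\) jumps whenever the greedy policy of the iterate changes, which is what enables the control-theoretic stability arguments.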